Sequence determines degree of knottedness in a coarse-grained protein model

Knots are abundant in globular homopolymers but rare in globular proteins. To shed new light 
on this long-standing conundrum, we study the influence of sequence on the formation of knots in 
proteins under native conditions within the framework of the hydrophobic-polar (HP) lattice protein 
model. By employing large-scale Wang-Landau simulations combined with suitable Monte Carlo 
trial moves, we show that, even though knots are still abundant on average, sequence introduces 
large variability in the degree of self-entanglements. Moreover, we are able to design sequences 
which are either almost always or almost never knotted. Our findings serve as proof of concept 
that the introduction of just one additional degree of freedom per monomer (in our case sequence) 
facilitates evolution towards a protein universe in which knots are rare. 

PACS numbers: 87.14.et, 87.10.Rt, 02.10.Kn, 87.15.Cc 


Knots have fascinated physicists, mathematicians and chemists for a long time. About 140
years ago, Kelvin hypothesized that atoms consist of knots in the aether [1]. At first sight,
this beautiful idea is quite appealing as knots are, in a sense, unique and, just like atoms,
cannot change their type: Without breaking bonds, a simple unknotted ring (a so-called
unknot) cannot, e.g., be transformed into a trefoil knot (3_1, with three minimal crossings
in a projection onto a plane). But as this aesthetically pleasing model was finally rejected,
most of the initial enthusiasm among natural scientists faded, and knot theory became truly
a part of the mathematical sciences. In recent decades, however, the field went through a
renaissance spurred by the discovery of knots in DNA Eli and proteins 0®.

Knotted proteins in particular pose a number of challenges which are not overcome easily and
question our understanding of evolution and folding, especially when we keep in mind that
the function of a protein is determined by its three-dimensional structure. Only eleven folds
are known to be knotted (one of which has been created artificially) and most of these knots
are simple trefoils [6]. There is also one protein knot with five crossings, which
incidentally makes up 1-2 % of our brain protein mass (pdb-code: 2etl) [7], and there is even
a knot with six crossings (pdb-code: 3bjx) [8]. Indeed, it is difficult to imagine how such
proteins always fold into their knotted native state [9]. A number of experiments have shown
that certain knotted proteins can refold to the knotted state upon degradation m and that
the process can be accelerated by chaperones HU. From a topological point of view, folding
may not always be as difficult as it appears at first, though, as even complicated knots
(e.g., the 6_1 knot mentioned above) can be generated from an unknotted state by a single
global movement of a subchain, as shown by coarse-grained folding simulations with
Go-models [8].

The apparent rarity of knotted proteins is in stark contrast to the abundance of knots in
globular polymers Ha¬ng. Even though proteins are not archetypal homopolymers of the
bead-spring type, this discrepancy is nevertheless remarkable. Indeed, there are several
competing (and even complementing) ideas why knots are rare. Taylor and Lin m pointed out
that proteins should rather be compared to a chain of “sticky beads”, a visualization of an
old idea H3: The protein essentially folds from an unknotted swollen state and remains in an
unknotted (“crumpled”) globular state which results from the initial collapse. From a
structural point of view, the emergence of secondary structure also changes the length scale
at which knots occur and likely decreases their probability of occurrence. A first systematic
study in this context was undertaken by Lua and Grosberg [18]; they compared the scaling of
subchains between real proteins and compact lattice loops and found that, statistically,
proteins tend to “fold back on themselves” at intermediate scales up to 40 amino acids,
which may act as a strong suppressor of knotting. To what extent this is a result of
evolution working towards the suppression of knots (as they may be adverse to folding or
function) is still largely unknown.

In this letter we focus on how such mechanisms may have evolved in the first place. Consider
a statistical ensemble of (potentially highly knotted) globular proteins made up of random
amino acids with a certain degree of variability. Natural selection has led to a “Protein
Universe” USED] significantly different from the statistical average of our random amino
acid chains, apparently full of purpose and function and with few or no knots.

Within the framework of a minimalist protein model, the hydrophobic-polar (HP) lattice model
[21-23], we show that a single additional degree of freedom per monomer, namely sequence,
may provide an evolutionary pathway which allows proteins to evolve towards a “lattice
protein universe” which is almost void of knots. We are able to design sequences and identify
patterns which suppress or enhance the formation of knots in our lattice model. However, due
to the coarse-grained nature of the lattice, these sequences are typically not the same as in
real proteins, which are considerably more complex.

In the HP model the protein is represented as a self-avoiding chain of beads (the amino acid
residues) on a regular lattice (here, simple cubic). There are only two classes of amino
acids, hydrophobic (H) and polar (P) residues. Proteins, as opposed to homopolymers, have a
hydrophobic core resulting from the tendency to shield the hydrophobic side-chains from the
polar (aqueous) environment. In the HP model this hydrophobic force is (implicitly) mimicked
by an attractive interaction ε that acts between non-bonded neighboring H residues
(ε_HH = -1, ε_HP = ε_PP = 0). Thus, at low temperatures H residues tend to gather in the
interior of the globular state and form a hydrophobic core while P residues are located at
the outer shell. Despite its limitations [24, 25], the HP model has been widely used to
describe protein folding qualitatively and to shed new light onto some of the most puzzling
questions in protein science (e.g., the Levinthal and blind watchmaker paradoxes ED],
chaperonin-mediated protein folding m, and mutation-induced fold switching m, to mention a
few). Thus, it also serves as a good starting point to address the question of knottedness
(a fundamental, topological property of proteins) at the level of abstraction of the present
study. While the result of a successful folding process along a folding funnel is generally
assumed to correspond to a free energy minimum EDI ESI, our simulations generate
conformations in the close vicinity of this minimum, which are subsequently analyzed with
respect to knots m-.
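The energy function described above can be sketched in a few lines. This is a minimal illustration, not the code used in this study; the function name `hp_energy` and the representation of a conformation as a list of integer (x, y, z) tuples are assumptions made for the example.

```python
def hp_energy(sequence, coords):
    """HP energy on the simple cubic lattice.

    Every pair of H residues that sits on neighboring lattice sites but is
    not bonded along the chain contributes -1 (epsilon_HH = -1); all other
    pairs contribute 0.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    site_to_index = {tuple(c): i for i, c in enumerate(coords)}
    if len(site_to_index) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    energy = 0
    # Looking only in the three positive directions counts each pair once.
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            j = site_to_index.get((x + dx, y + dy, z + dz))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


# Four beads folded into a unit square: beads 0 and 3 are lattice
# neighbors but not bonded, so they form the only possible contact.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

In this toy conformation only the two terminal beads can form a non-bonded contact, so the energy is -1 whenever both are hydrophobic and 0 otherwise.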

In order to address the problem from a statistical point of view, we need to sample a large
ensemble of random protein sequences under native conditions (i.e., ground-state like). To
make sure that lattice effects do not bias the statistics, long chain lengths (N > 100) are
required m-. Together, these requirements pose a considerable challenge for the computational
procedure and, thus, a similar systematic study has not been carried out for any type of
protein model so far. Even for the very simplified HP model, estimating the ground state of a
specific HP sequence has only been possible up to around 100 monomers with state-of-the-art
techniques and computational power [31-33].

Recently, however, Wüst and Landau [34] proposed an efficient Monte Carlo scheme which
renders the sampling of uncorrelated, low-energy (i.e., “native-like”) structures feasible
even for chain lengths up to N = 500. The key of their procedure is the combination of
Wang-Landau (WL) sampling [35] with two non-traditional Monte Carlo trial moves, namely pull
moves [36] and bond-rebridging moves m, which complement each other extremely well. Their
methodology has proven to be very powerful in overcoming both the energetic and entropic
barriers typically encountered when sampling the complex free energy landscape of dense
lattice polymers and proteins. For details, see [34].
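To illustrate the basic idea of Wang-Landau sampling only, the following sketch estimates the density of states g(E) of a deliberately trivial toy system: n independent two-state sites whose energy is the number of sites in state 1, so that the exact result is a binomial coefficient. It does not implement the HP chain, pull moves or bond-rebridging moves of [34]; the function name, the flatness criterion of 0.8, the batch size and the final modification factor are all assumptions chosen for the example.

```python
import math
import random


def wang_landau_toy(n_sites=8, ln_f_final=1e-3, flatness=0.8, batch=10000, seed=0):
    """Estimate ln g(E) for n_sites independent two-state sites.

    E is the number of sites in state 1, hence exactly g(E) = C(n_sites, E).
    Returns the estimated ln g(E), shifted so that ln g(0) = 0.
    """
    rng = random.Random(seed)
    state = [0] * n_sites
    energy = 0
    ln_g = [0.0] * (n_sites + 1)   # running estimate of ln g(E)
    hist = [0] * (n_sites + 1)     # visit histogram for the flatness check
    ln_f = 1.0                     # logarithm of the modification factor

    while ln_f > ln_f_final:
        for _ in range(batch):
            i = rng.randrange(n_sites)
            new_energy = energy + (1 if state[i] == 0 else -1)
            # Accept the flip with probability min(1, g(E_old) / g(E_new)).
            delta = ln_g[energy] - ln_g[new_energy]
            if rng.random() < math.exp(min(0.0, delta)):
                state[i] = 1 - state[i]
                energy = new_energy
            # Update the current energy level whether or not the move was accepted.
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Once the histogram is flat enough, reset it and refine ln f.
        if min(hist) >= flatness * sum(hist) / len(hist):
            hist = [0] * len(hist)
            ln_f /= 2.0

    return [value - ln_g[0] for value in ln_g]


ln_g = wang_landau_toy()
print([round(v, 2) for v in ln_g])
```

For this toy system the estimate can be compared with the exact values ln C(8, E), for instance ln 70 ≈ 4.25 at E = 4. In a production setting the random walk would instead be restricted to an energy window of the HP chain and driven by the move sets named above.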

For the topological characterization of protein conformations we need to compute so-called
knot invariants, which are only unique for closed curves (mathematically, knots are only
well-defined for closed curves). Thus, for linear polymers and proteins the notion of
knottedness needs to be extended to open chains by choosing a particular closure which
connects the termini in a well-defined manner (thus closing the loop) [38-40]. It is
important that the closure itself has no significant influence on the calculation of the knot
invariants. Even though some ambiguity remains, different closures typically yield similar
results from a statistical point of view 0121. In this paper we use a rather simple closure
which was already applied successfully for the determination of knots in real proteins: We
determine the center of mass of the polymer and draw two lines starting there, one through
the first and one through the last bead. Outside the protein the two lines are connected by a
straight line. From this structure we compute the Alexander polynomial (knot invariant). The
numerical implementation of the entire procedure is described in great detail in jam].
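The geometric part of this closure can be sketched as follows. This is only one possible reading of the description above, not the referenced implementation: the function name `close_chain`, the choice to place the two outer points at twice the largest bead distance from the center of mass (`scale=2.0`), and the tuple-based coordinates are assumptions. The computation of the Alexander polynomial is not shown.

```python
import math


def close_chain(coords, scale=2.0):
    """Return a closed polygon for an open chain.

    Each terminus is extended along the ray from the center of mass
    through that terminus to a point outside the chain. The polygon is
    [outer_first, bead_0, ..., bead_N-1, outer_last]; the straight segment
    from outer_last back to outer_first closes the loop implicitly.
    """
    n = len(coords)
    com = tuple(sum(c[k] for c in coords) / n for k in range(3))
    # Largest bead distance from the center of mass (size of the globule).
    r_max = max(math.dist(c, com) for c in coords)

    def extend(point):
        direction = [point[k] - com[k] for k in range(3)]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            raise ValueError("terminus coincides with the center of mass")
        return tuple(com[k] + direction[k] / norm * scale * r_max for k in range(3))

    beads = [tuple(float(x) for x in c) for c in coords]
    return [extend(coords[0])] + beads + [extend(coords[-1])]


chain = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
closed = close_chain(chain)
print(closed[0], closed[-1])
```

For the four-bead example the center of mass is (0.5, 0.5, 0), and the two outer points lie on the diagonals through the first and last bead, at twice their distance from the center.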

Fig. 1 shows the unknotting probabilities for 100 random HP sequences and a few designed HP
sequences under native conditions. All chains consist of N = 500 monomers with 50% H and
50% P residues (except for the homopolymer with 100% H). This chain length was chosen such
that the homopolymer already exhibits a significant amount of knotting. We have also studied
shorter chains (down to N ≈ 100) and obtained qualitatively similar results, even though the
overall probability to find a knot for shorter chain lengths is, of course, correspondingly
smaller. In each simulation the sequence of a chain is fixed and does not vary. Thus, we
investigate an ensemble of sequences to show how the introduction of this additional degree
of freedom per monomer may affect an evolutionary system.
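The block notation used for the designed sequences in the captions of Figs. 1 and 2 can be expanded and checked mechanically. The sketch below is illustrative only; the helper names `blocks` and `composition` are assumptions. It confirms that three of the designed sequences indeed have N = 500 and 50% H.

```python
def blocks(*parts):
    """Expand (unit, repeat) pairs, e.g. ("HP", 7) -> "HPHPHPHPHPHPHP"."""
    return "".join(unit * count for unit, count in parts)


def composition(seq):
    """Return (chain length, fraction of H residues)."""
    return len(seq), seq.count("H") / len(seq)


# Designed sequence #1 (almost never knotted): [HHPP]_125
seq1 = blocks(("HHPP", 125))

# Designed sequence #2 (highly knotted):
# [P_10 (HP)_7 H_10 (HP)_8]_5 [(PH)_8 H_10 (PH)_7 P_10]_5
first = blocks(("P", 10), ("HP", 7), ("H", 10), ("HP", 8))
second = blocks(("PH", 8), ("H", 10), ("PH", 7), ("P", 10))
seq2 = first * 5 + second * 5

# One of the Fig. 1 sequences: PH (PHPHPHHPHPHHPPHP)_31 HP
seq_fig1 = blocks(("PH", 1), ("PHPHPHHPHPHHPPHP", 31), ("HP", 1))

for name, seq in (("seq1", seq1), ("seq2", seq2), ("seq_fig1", seq_fig1)):
    print(name, composition(seq))
```

Note that the second half of sequence #2 is the first half read backwards, so the full sequence is a palindrome.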

FIG. 1. Left panel: Unknotting probabilities of 100 random HP sequences (gray dots) and
selected, designed sequences (black dots with error bars) with N = 500 monomers under native
conditions; for the latter, representative snapshots of native conformations are shown too.
The designed sequences are: (upper part, from left to right) seq. #1 (see Fig. 2);
PH(PHPHPHHPHPHHPPHP)_31 HP; self-avoiding walk ([HP]_250);
(P_2 H_2)_12 (P_4 H_4)_8 (P_2 H_2)_12 (P_4 H_4)_8 (P_2 H_2)_6 PH + same sequence in reverse
order; (lower part, from left to right) homopolymer (H_500); (H_10 P_10)_25; seq. #2 (see
Fig. 2). Error bars of unknotting probabilities for the individual sequences have been
estimated by a jackknife analysis (because of similarity, only shown for the designed
sequences). Note that the distribution of points on the x-axis is arbitrary. The mean
unknotting probability of the 100 random sequences is 0.460(5) (thin horizontal line). Right
panel: Frequency distribution of unknotting probabilities of the 100 random HP sequences.

It is worth noting that the HP model exhibits a rather large ground-state degeneracy, which
could be reduced somewhat by introducing additional interactions between H and P monomers.
The degeneracy of the ground state and the additional states in its vicinity allowed us,
however, to determine a “likelihood” of knottedness for a given HP sequence as follows:
First, a pre-WL run was performed to obtain an estimate of its ground-state energy. Then, a
subsequent production WL simulation, restricted to the lowest 20 % of the entire energy
range, consecutively sampled conformations within 5 % of the ground-state energy. (This
threshold was set heuristically, but other values <10% gave similar results.) Between the
sampling of any two conformations the random walker must always perform a full round trip
through the specified energy range in order to reduce possible structural correlations.
Multiple, independent production WL simulations were run simultaneously to speed up the
sampling and further increase the structural diversity of sampled conformations. Eventually,
1000 conformations were randomly selected among the entire sample and their knottedness
analyzed. The unknotting probability, as displayed in Fig. 1, is then defined as the number
of unknotted conformations divided by 1000 [42]. The total sampling time for the whole study
amounted to more than three million CPU hours (AMD Opteron 6272, 2.1 GHz).
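The definition of the unknotting probability and a jackknife error estimate can be sketched as follows. The text does not specify which jackknife variant was used; a delete-one jackknife over independent conformations is assumed here, and both the function name `unknot_probability` and the input data are synthetic placeholders rather than results of this study.

```python
import math


def unknot_probability(is_unknot):
    """Fraction of unknotted conformations and its delete-one jackknife error.

    is_unknot: sequence of 0/1 flags, one per analyzed conformation.
    """
    n = len(is_unknot)
    total = sum(is_unknot)
    p = total / n
    # Leave-one-out estimates of the unknotting probability.
    loo = [(total - x) / (n - 1) for x in is_unknot]
    mean_loo = sum(loo) / n
    variance = (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)
    return p, math.sqrt(variance)


# Synthetic example: 300 unknotted conformations out of 1000 analyzed.
flags = [1] * 300 + [0] * 700
p_unknot, error = unknot_probability(flags)
print(p_unknot, error)
```

For uncorrelated 0/1 data the delete-one jackknife error reduces to sqrt(p (1 - p) / (n - 1)), which provides a simple cross-check of the implementation.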

FIG. 2. Snapshots of representative native structures. From left to right: Designed HP
sequence featuring almost no knots ([HHPP]_125); random HP sequence with a modest degree of
knottedness; homopolymer (an exact ground-state structure); highly knotted designed HP
sequence ([P_10 (HP)_7 H_10 (HP)_8]_5 [(PH)_8 H_10 (PH)_7 P_10]_5). Upper structures:
Monomer type coloring with hydrophobic monomers and polar monomers shown in red and blue,
respectively. Lower structures: Monomer index coloring, with colors gradually changing from
blue (monomers at the beginning of the sequence), over white (monomers in the center of the
sequence) to red (monomers at the end of the sequence). p_unknot denotes the unknotting
probability of the corresponding sequence.

To interpret this probability we can imagine that an HP lattice polymer with a given sequence
represents a large number of possible proteins with the same or a very similar sequence of
hydrophobic and polar amino acids. Small changes in interactions (representing, e.g., amino
acids with a slightly different degree of hydrophobicity) or slightly different sequences
will lead to similar knotting behavior. To check this assumption we have performed
additional simulations in which we randomly mutated monomers (while keeping the ratio of H
and P residues constant) and have confirmed that the likelihood of containing knots is indeed
similar for mutation fractions up to 4%. Hence, the unknotting probability can be interpreted
as an estimate for the unknotted fraction of conformational space of proteins represented by
this sequence.
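One simple way to mutate a sequence while keeping the H/P ratio fixed is to turn randomly chosen H residues into P and the same number of P residues into H. The text does not state how the ratio was kept constant in the actual simulations, so this scheme and the function name `mutate_keep_ratio` are assumptions made for illustration.

```python
import random


def mutate_keep_ratio(seq, n_swaps, rng):
    """Flip n_swaps H residues to P and n_swaps P residues to H.

    Exactly 2 * n_swaps monomers change type; the H/P ratio is unchanged.
    """
    residues = list(seq)
    h_sites = [i for i, r in enumerate(residues) if r == "H"]
    p_sites = [i for i, r in enumerate(residues) if r == "P"]
    if n_swaps > min(len(h_sites), len(p_sites)):
        raise ValueError("not enough H or P residues for the requested swaps")
    # Sampling without replacement guarantees distinct mutation sites.
    for i, j in zip(rng.sample(h_sites, n_swaps), rng.sample(p_sites, n_swaps)):
        residues[i], residues[j] = "P", "H"
    return "".join(residues)


# A mutation fraction of 4% for N = 500 corresponds to 20 changed monomers,
# i.e. 10 H->P flips paired with 10 P->H flips.
original = "HHPP" * 125
mutated = mutate_keep_ratio(original, 10, random.Random(1))
print(sum(a != b for a, b in zip(original, mutated)))
```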

Fig. 1 clearly illustrates the strong dependence of the degree of knottedness on the
particular sequence. Even for the random sequences the unknotting probability fluctuates
between 0.3 and 0.6. Despite this large variability in the tendency of individual sequences
to form knots, on average heteropolymers are almost as knotted as globular homopolymers. Most
remarkable, however, is the fact that it is possible to design HP sequences (notably, with
the same ratio of H and P) featuring almost no knots or being almost fully knotted.

Fig. 2 shows snapshots of typical native-state-like structures for a random HP sequence, two
designed HP sequences, and the homopolymer. Lattice homopolymers close to the native state
are cubic, but have little local order: Inside the cube the chain goes back and forth,
leading to a rather large degree of knottedness. A typical random HP sequence already has a
pronounced hydrophobic core and tends to be a bit more ordered at the local scale: Beads
which are only a few monomers apart tend to occupy the same region in space. This leads to a
small decrease in the overall knotting probability, but by no means explains the large
discrepancy between proteins and homopolymers. The two designed sequences, which are extreme
examples with respect to the variability in the tendency to form knots, exhibit very
pronounced and distinct features. The pattern of alternating HH and PP segments in the
designed sequence #1, which is almost always unknotted, induces a very regular, slab-shaped
native structure with H monomers filling the interior of the slab and P monomers occupying
its border. This compact structure results from a distinct local threading of the sequence
which disfavors entanglements. Alternating sequences of H_4 P_4 and H_2 P_2 segments have a
similar effect (cf. snapshot in Fig. 1), but structures tend to be ellipsoidal rather than
flat. Another, almost trivial motif is a simple alternation of H and P monomers, which forms
a swollen coil structure akin to a self-avoiding walk (cf. snapshot in Fig. 1). In contrast,
a pattern which highly favors the formation of knots is presented in the designed sequence
#2. It consists of long contiguous segments of H and P residues separated by segments of
repeating (HP) motifs. This sequence forces the protein to fold back through extended loops
in order to optimize the number of non-bonded HH interactions, and there is almost no local
order inside the hydrophobic core. Both features foster entanglements and knots. Again, we
need to stress that there is no one-to-one correspondence between motifs which enhance or
suppress knots in lattice proteins and real proteins, which are considerably more complex.
Indeed, for the latter such motifs have not even been identified.

Despite the variation in the degree of knottedness, all native-like structures of HP
sequences exhibit a more or less pronounced hydrophobic core. Thus, the formation of a
hydrophobic core in itself cannot be considered as a precursor of suppression of knots in
proteins. However, the local structure (order) among residues within a sequence strongly
influences knotting, as manifested by the index coloring scheme of corresponding structures
(see lower row of snapshots in Fig. 2). Whereas in the designed sequence #1 nearby monomers
are strongly localized and form a precursor of secondary structure, in the designed sequence
#2 they tend to spread out far and in uncorrelated directions. In real proteins individual
elements of secondary structure have the tendency to fold back onto themselves which, in
turn, introduces locality and suppresses knots [18]. It is remarkable that the HP sequences
studied here show the same relationship between knottedness and local structure despite the
simplicity of the underlying protein model.

FIG. 3. Average probabilities of finding unknots (upper curves) and trefoils (lower curves),
respectively, in homopolymers (100 % H) and random heteropolymers (50 % H, 50 % P) with
N = 500 as a function of the root mean squared radius of gyration, sqrt(<R_g^2>). The red
symbols denote corresponding probabilities under native conditions (ground-state like). Error
bars have been calculated by averaging over independent runs; they do not exceed symbol size
and are, thus, not shown.

Finally, we compare the average knotting probability of random hetero- and homopolymers as a
function of solvent quality (i.e., temperature in our model) ranging from ground-state-like
structures, in which knots tend to spread over the whole structure, to the denatured case, in
which they are weakly localized (not shown here). To make a fair comparison, we plot the
probability of observing an unknotted structure (or a trefoil knot) as a function of the
radius of gyration. To be able to define an average knotting probability for random
heteropolymers, we have again averaged over our 100 random HP sequences; Metropolis Monte
Carlo sampling (using the same move sets as described above) has been employed to obtain
correctly weighted estimates at finite temperatures. Fig. 3 shows that the probability of
finding unknots or trefoil knots in heteropolymers (averaged over random sequences) is quite
similar to the one for homopolymers at comparable densities. However, at high densities (low
temperatures) the unknotting probability of heteropolymers clearly deviates from the
decreasing trend observed in homopolymers.
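The radius of gyration used as the abscissa in Fig. 3 follows from the standard definition, R_g^2 = (1/N) sum_i |r_i - r_cm|^2, averaged over sampled conformations. A minimal sketch, with the function names and the toy conformations being assumptions for illustration:

```python
import math


def squared_radius_of_gyration(coords):
    """R_g^2 = mean squared distance of the beads from their center of mass."""
    n = len(coords)
    com = [sum(c[k] for c in coords) / n for k in range(3)]
    return sum(sum((c[k] - com[k]) ** 2 for k in range(3)) for c in coords) / n


def rms_radius_of_gyration(conformations):
    """sqrt(<R_g^2>), averaged over a list of conformations."""
    values = [squared_radius_of_gyration(c) for c in conformations]
    return math.sqrt(sum(values) / len(values))


# Toy conformations: a compact unit square and a straight three-bead rod.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
rod = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
print(squared_radius_of_gyration(square), squared_radius_of_gyration(rod))
print(rms_radius_of_gyration([square, rod]))
```

Compact (globular) conformations yield a small radius of gyration and swollen coils a large one, which is why this quantity serves as a proxy for density when comparing chains at different temperatures.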

In this study we have been able to demonstrate quantitatively that sequence strongly
influences (or even determines) the degree of knottedness under native conditions. Within the
framework of the minimalist HP protein model and large-scale Monte Carlo simulations, we have
determined probabilities of knotting for random HP sequences as well as homopolymers with 500
residues. The introduction of sequence leads to a large variability in the
self-entanglements of heteropolymers even though on average they are almost as knotted as
globular homopolymers of comparable density. We have also been able to design sequences
which fold into either highly knotted or almost knot-less structures. While we demonstrate
that a variation of sequence leads to a variation of self-entanglements and knots, it is
likely that variability in other interactions may have similar effects. This shows in
principle that the introduction of a single additional degree of freedom per monomer, in our
case sequence, may already suffice to facilitate evolution towards a largely unknotted
“Protein Universe”. In a sense, proteins are not an equilibrium ensemble of (knotted) random
heteropolymers and should as such not be compared to an equivalent ensemble of homopolymers,
but instead live in a very specific conformational subspace in which knots are rare.